Team name - Cryptonite

Topic - Cryptocurrency Analysis

Team Members -

Aditya D (BL.EN.U4CSE19030)

Avinash S (BL.EN.U4CSE19010)

Karan Singh (BL.EN.U4CSE19062)

Introduction

This report is a case study of the cryptocurrency market from 2013 to 2018.
We analyze different factors that influence the price of various cryptocurrencies.
Finally, we try to implement a regression model that predicts the price of a coin based on its previous values.

Mapping Mismatched Data

Mapping through prefix and replace functions

Mapping through observation

Data to Be Removed

Duplicates

Of the 8 duplicated data points above, we know that 3 will be removed.
Let's check whether the remaining ones carry any more information, such as the symbol, that tells them apart.

For hempcoin, the symbol clearly shows that the 2nd entry maps to hempcoinhmp.
For enigma, we have to assume that the 2nd entry maps to hempcoinproject.
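A minimal pandas sketch of this step, assuming the coins sit in a frame with `name` and `symbol` columns. The toy rows, the THC symbol and the `rename_map` dictionary are hypothetical and only show how a duplicated name can be split apart using its symbol.

```python
import pandas as pd

# Hypothetical toy data, not the real dataset
coins = pd.DataFrame({
    "name": ["hempcoin", "hempcoin", "bitcoin"],
    "symbol": ["THC", "HMP", "BTC"],
})

# keep=False flags every row whose name appears more than once,
# not only the second and later occurrences
dupes = coins[coins.duplicated(subset="name", keep=False)]

# Hypothetical manual mapping keyed on (name, symbol), so the first
# hempcoin entry keeps its name and only the HMP one is renamed
rename_map = {("hempcoin", "HMP"): "hempcoinhmp"}
coins["name"] = [
    rename_map.get((name, sym), name)
    for name, sym in zip(coins["name"], coins["symbol"])
]
print(coins["name"].is_unique)  # True
```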

Data Cleaning

All currencies dataset

Metadata

From the dataset above, we can tell that name could act as the index of the dataset.
To make it one, we first need to apply a few adjustments identified in the mapping notebook.

Name Changes

We can now write the corrected names back to the original dataset.

Now that the necessary name changes are done, we can set name as the index.
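A short sketch of setting the index, assuming a frame with an already-cleaned `name` column (the rows and prices below are illustrative, not taken from the dataset). Passing `verify_integrity=True` makes pandas raise a `ValueError` if any name is still duplicated, which catches a missed mapping before it reaches the EDA.

```python
import pandas as pd

# Illustrative rows with already-cleaned names (hypothetical values)
df = pd.DataFrame({
    "name": ["bitcoin", "ethereum", "hempcoinhmp"],
    "price": [7000.0, 290.0, 0.01],
})

# Raises ValueError if any name occurs more than once
df = df.set_index("name", verify_integrity=True)
print(df.loc["ethereum", "price"])  # 290.0
```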

Missing Values

We drop the rows with more than 2 missing values and apply filler techniques to the rest.
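In pandas, "more than 2 missing values" translates to the `thresh` argument of `dropna`, which counts the non-null values a row needs in order to be kept. The toy frame below is hypothetical; with 4 columns, allowing at most 2 missing values means requiring at least 2 non-null ones.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with 4 feature columns
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, np.nan, 3.0, np.nan],
    "c": [1.0, np.nan, np.nan, 4.0],
    "d": [1.0, 2.0, 4.0, 4.0],
})

max_missing = 2
# thresh = minimum number of non-null values required to keep a row
df_kept = df.dropna(thresh=df.shape[1] - max_missing)
# Row 1 (3 missing) is dropped; row 2 (exactly 2 missing) is kept
```

An equivalent, more literal filter is `df[df.isna().sum(axis=1) <= max_missing]`.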

Apart from market_cap and circulating_supply, the other features have very few missing values.

Here, we can see that the missing values in market_cap and circulating_supply largely occur in the same rows.
For now, let's fill all the other features with their respective means.
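A sketch of mean imputation restricted to the "other" features. The columns `volume` and `change_24h` are hypothetical stand-ins; `price`, `market_cap` and `circulating_supply` are skipped because the text handles them separately.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [1.0, np.nan, 3.0],
    "market_cap": [100.0, np.nan, 50.0],
    "circulating_supply": [10.0, np.nan, np.nan],
    "volume": [5.0, np.nan, 7.0],       # hypothetical column
    "change_24h": [1.0, 2.0, np.nan],   # hypothetical column
})

# Leave these for later; fill every other column with its own mean
skip = ["price", "market_cap", "circulating_supply"]
others = df.columns.difference(skip)
df[others] = df[others].fillna(df[others].mean())
```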

For now, we can ignore the price feature, as its 2 missing data points will be dropped either way.

Now that the NaNs in the other features are dealt with, we can look into circulating_supply and market_cap.
Both features are related through the equation
$\text{Market Cap} = \text{Current Price} \times \text{Circulating Supply}$.
Rearranged as $\text{Circulating Supply} = \text{Market Cap} / \text{Current Price}$, it can be used to fill the NaNs in circulating_supply wherever market_cap and price are known.
Next, we drop the data points where both features are null.
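A minimal sketch of both steps on a hypothetical three-row frame: fill missing supply from market cap divided by price, then drop the rows where market cap and supply are both still null (`how="all"`).

```python
import numpy as np
import pandas as pd

# Hypothetical toy values
df = pd.DataFrame({
    "price": [2.0, 4.0, 1.0],
    "market_cap": [200.0, 400.0, np.nan],
    "circulating_supply": [100.0, np.nan, np.nan],
})

# Circulating Supply = Market Cap / Current Price, only where supply is missing
df["circulating_supply"] = df["circulating_supply"].fillna(
    df["market_cap"] / df["price"]
)

# Rows where both are null cannot be recovered; drop them
df = df.dropna(subset=["market_cap", "circulating_supply"], how="all")
```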

We've removed 360 data points and applied filler methods to the remaining NaNs.
Let's look at the bar chart to see whether there are any left.

Now that we've removed all the NaNs, we can rename our features for ease of access during EDA.

Renaming Feature Vectors

CryptoCurrency Prices by Date Dataset

For this dataset, we start by converting the date back to a standard date format.
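A sketch of the conversion with `pd.to_datetime`, assuming the raw dates are strings such as "Apr 28, 2013". That format is an assumption; the `format` string must be adjusted to whatever the raw column actually contains.

```python
import pandas as pd

# Hypothetical raw date strings
prices = pd.DataFrame({"date": ["Apr 28, 2013", "Feb 09, 2018"]})

# Assumed source format: abbreviated month, day, four-digit year
prices["date"] = pd.to_datetime(prices["date"], format="%b %d, %Y")
print(prices["date"].dt.year.tolist())  # [2013, 2018]
```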

Date

Renaming Feature Vectors

Next, let's rename our features for ease of access.
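One way to rename features in bulk is to normalize the headers with pandas string methods: strip, lower-case, and collapse runs of non-alphanumeric characters into underscores. The raw headers below are hypothetical.

```python
import pandas as pd

# Hypothetical raw headers
prices = pd.DataFrame(columns=["Coin Name", " Close Price", "Market Cap ($)"])

prices.columns = (
    prices.columns.str.strip()
    .str.lower()
    .str.replace(r"[^a-z0-9]+", "_", regex=True)  # spaces, "($)" -> "_"
    .str.strip("_")                               # drop trailing "_"
)
print(list(prices.columns))  # ['coin_name', 'close_price', 'market_cap']
```

For a handful of columns, an explicit `prices.rename(columns={...})` mapping is just as clear.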

Name Changes

We need to make some changes to the coin names.

Finally, we drop the data for coins that have no corresponding entry in the first dataset.

Even though this dataset is quite long, there seems to be no better way to organize it.
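A sketch of the filter, assuming the first dataset is indexed by `name` (as set earlier) and the price history has a `name` column. The coin "deadcoin" and all values are hypothetical.

```python
import pandas as pd

# First dataset, indexed by name (hypothetical values)
snapshot = pd.DataFrame(
    {"price": [7000.0, 290.0]},
    index=pd.Index(["bitcoin", "ethereum"], name="name"),
)

# Price history containing one coin that the snapshot lacks
history = pd.DataFrame({
    "name": ["bitcoin", "ethereum", "deadcoin"],
    "close": [1.0, 2.0, 3.0],
})

# Keep only coins that also appear in the first dataset
history = history[history["name"].isin(snapshot.index)].reset_index(drop=True)
```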

Implementation

Outliers

In the first dataset, we can see data about 1142 different cryptocurrencies at a single point in time (i.e. 02/09/2018 12:10pm).
This makes the data points unrelated to one another, except that each of them represents a particular cryptocurrency.
Thus, taken feature-wise, this data is a single snapshot rather than a time series.

In the prices-by-date dataset, we can see how the price of each cryptocurrency changes over the period in which it exists.
Our period of interest is 2014-2018, as that is when the cryptocurrency market boomed.

Appropriate Features

We select the features from which we may have to remove outliers before continuing the analysis.
We only consider features that do not span a period of time, and ignore ones such as the daily trend because they are very volatile.

MarketCap

After autoscaling the violin plot above, we can see 4 outliers: Bitcoin, Ethereum, Ripple and Bitcoincash.
Their market capitalization is much higher than the rest, as these cryptocurrencies were the pioneers of the cryptocurrency market.

Price

After autoscaling the violin plot above, we again see 3 outliers: Bit20, Projectx and 42coin.
These cryptocurrencies are considerably more expensive than all the others in the market.

Circulating Supply

Finally, after autoscaling the violin plot above, we can observe 5 outliers: Sprouts, Paccoin, Kin, Dimecoin and Fedoracoin.
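The outliers above were identified visually from the violin plots. As an automated cross-check, and a different technique from the one used in this report, Tukey's fences flag values outside [Q1 - k·IQR, Q3 + k·IQR]. The series and the default k = 1.5 below are illustrative.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of s lying outside Tukey's fences."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]

# Hypothetical market caps for coins a..g
market_cap = pd.Series([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 500.0], index=list("abcdefg"))
print(iqr_outliers(market_cap).index.tolist())  # ['g']
```

On heavily skewed features such as market cap, k = 1.5 tends to flag far more coins than visual inspection does, so a larger k or a log scale may be a closer match to the plots.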

Data Visualizations and Analysis

These are all the outliers observed in the first three features.
Two more were identified by observation from the trend visualizations, unityingot and ecoin, because of their enormous percentage changes.

Single Variable Analysis

MarketCap

Price

Circulating Supply

After removing the 12 outliers, our data looks quite similar, though a bit more spread out.
This is because, in line with the Pareto principle, a few values will always dominate the rest.

Price Percentage Changes Bar chart

From this graph, we can see that on an hourly basis the market is very bearish, since there are more red bars than green.
On a daily basis, however, the market is extremely bullish, since there are far more greens than reds and the trend is very strong.
On a weekly basis, the numbers of reds and greens are almost equal, but the greens are much taller than the reds.
This implies that the market is following a weak bullish trend.
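The bar-chart reading can be summarized numerically by counting gainers and losers per horizon and comparing the average size of gains and losses. The column names below are hypothetical, and the toy values are chosen only to mirror the pattern described above.

```python
import pandas as pd

# Hypothetical percentage changes per coin and horizon
changes = pd.DataFrame({
    "percent_change_1h": [-0.5, -1.0, 0.2, -0.3],
    "percent_change_24h": [5.0, 8.0, -1.0, 3.0],
    "percent_change_7d": [20.0, -2.0, 15.0, -3.0],
})

summary = pd.DataFrame({
    "gainers": (changes > 0).sum(),                  # green bars
    "losers": (changes < 0).sum(),                   # red bars
    "mean_gain": changes.where(changes > 0).mean(),  # height of greens
    "mean_loss": changes.where(changes < 0).mean(),  # height of reds
})
print(summary)
```

Equal counts with a larger mean gain than mean loss (as in the 7d row here) is the "almost equal bars, but taller greens" pattern.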

Profit Loss Pie charts

These pie charts show the collective behavior of the entire market.
We can see the percentage of cryptocurrencies making a profit or a loss.
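The slices of such a pie chart are the normalized counts of each outcome. A short sketch for one horizon, using hypothetical 24h changes:

```python
import numpy as np
import pandas as pd

change_24h = pd.Series([5.0, 8.0, -1.0, 3.0, 0.0])  # hypothetical values

# Label each coin by the sign of its change, then take percentage shares
labels = np.sign(change_24h).map({1.0: "profit", -1.0: "loss", 0.0: "no change"})
shares = labels.value_counts(normalize=True) * 100
```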

Volatility

We calculate the volatility of each cryptocurrency as the standard deviation of its price over its active time period.
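A minimal sketch of the volatility definition used here, assuming a long-format price history with hypothetical `name` and `close` columns. Note that pandas' `std` is the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical history: one row per coin per day
history = pd.DataFrame({
    "name": ["a", "a", "a", "b", "b", "b"],
    "close": [1.0, 2.0, 3.0, 10.0, 10.0, 10.0],
})

# Standard deviation of each coin's price over the days it traded
volatility = history.groupby("name")["close"].std()
print(volatility.to_dict())  # {'a': 1.0, 'b': 0.0}
```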

Multi Variable Analysis

Correlation Matrix Heatmap

Price vs Volatility

Here, the pattern between price and volatility shows that the majority of the data is linearly related, roughly following (2y = x). The exceptions are mostly the outliers with a price above 4k, which don't fit this linear pattern.

There are many possible reasons why volatility is proportional to price, and popularity is a likely one. Cryptocurrency markets are open 24/7, so people all across the globe trade cryptocurrencies at almost all times of the day, depending on their time zones. The more popular a cryptocurrency is, the more it is traded. This means that the popular currencies (like BTC and ETH) attract more investors, making them more expensive.
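The approximate linear relation can be checked by fitting a straight line to the coins priced at or below 4k. This sketch assumes volatility is regressed on price (which axis the report's (2y = x) refers to is not stated), and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical per-coin price and volatility
coins = pd.DataFrame({
    "price": [1.0, 10.0, 100.0, 1000.0, 9000.0],
    "volatility": [0.5, 5.0, 50.0, 500.0, 1200.0],
})

# Exclude the high-priced exceptions before fitting
core = coins[coins["price"] <= 4000]
slope, intercept = np.polyfit(core["price"], core["volatility"], deg=1)
r = core["price"].corr(core["volatility"])  # Pearson correlation
```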

Market Capitalization vs Volume

In this graph of market cap and volume, we can observe the data points grouped towards the origin. Looking at the whole graph, including the outliers, the pattern looks linear with a very low slope. Within the grouped data points, however, the pattern is a line running parallel to the x-axis.

This pattern appears because most of our cryptocurrencies lie between 0 and 0.5 billion while spanning a variety of market caps, ranging from thousands to millions, which produces a straight line parallel to the x-axis.

Daily vs Weekly

Daily and weekly trends have a low correlation, which is visible in the scatter plot above. The only reason we see even a slight correlation is that our data covers the period when cryptocurrencies became very popular and the markets turned very bullish. Hence both weekly and daily trends are bullish in nature, which explains the slight correlation.
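The coefficient behind this observation (and behind each cell of the correlation heatmap) is the Pearson correlation, which pandas computes with `Series.corr`. The column names and values here are hypothetical.

```python
import pandas as pd

# Hypothetical daily and weekly percentage changes
changes = pd.DataFrame({
    "percent_change_24h": [1.0, 2.0, 3.0, 4.0],
    "percent_change_7d": [2.0, 1.0, 4.0, 3.0],
})

r = changes["percent_change_24h"].corr(changes["percent_change_7d"])  # Pearson
matrix = changes.corr()  # full matrix, as used for a heatmap
```

Given how skewed these features are, `method="spearman"` is a reasonable rank-based alternative to check against.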

Summary

1. The volatility and price of a cryptocurrency are highly correlated, likely because both depend on the popularity of the coin.

2. Volume and market cap are highly correlated, even though volume does not appear in the market cap formula (Market Cap = Current Price × Circulating Supply).

3. At the time the data was collected, the market showed a bearish performance at the hourly rate, a very bullish performance at the daily rate, and an almost balanced performance at the weekly rate.

4. Weekly and daily trends are weakly correlated, because the overall market trend is bullish.

Machine Learning Algorithm
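As a minimal sketch of the regression idea from the introduction (predicting a coin's price from its previous values), and not the authors' actual model, one option is ordinary least squares on a fixed window of lagged prices. The window length `n_lags` and the helper names are hypothetical.

```python
import numpy as np

def make_lag_features(prices, n_lags):
    """Each row holds n_lags consecutive prices; the target is the next price."""
    prices = np.asarray(prices, dtype=float)
    X = np.column_stack([prices[i:len(prices) - n_lags + i] for i in range(n_lags)])
    y = prices[n_lags:]
    return X, y

def fit_ols(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_next(prices, coef, n_lags):
    """Predict the price following the last n_lags observations."""
    features = np.r_[1.0, np.asarray(prices, dtype=float)[-n_lags:]]
    return float(features @ coef)

# Toy series (not real data): 1, 3, ..., 19
series = np.arange(10) * 2.0 + 1.0
X, y = make_lag_features(series, n_lags=2)
coef = fit_ols(X, y)
pred = predict_next(series, coef, n_lags=2)
```

scikit-learn's `LinearRegression` fits the same OLS model. For an honest evaluation, the train/test split should be chronological rather than shuffled.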

Results Obtained

References

Pareto Distribution -> https://www.tuannguyen.tech/2020/08/discussion-pareto-distribution/
Volatility -> https://seic.com/sites/default/files/inline-files/SEI_Standard-Deviation_UK.pdf
Pandas -> https://pandas.pydata.org/docs/reference
Plotting -> https://plotly.com/python/
Scikit-learn -> https://scikit-learn.org/stable/